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Abstract 


The  modified  Cholesky  factorization  of  Gill  and  Murray  plays  an  important  role  in  optimi¬ 
zation  algorithms.  Given  a  symmetric  but  not  necessarily  positive  definite  matrix  A ,  it  computes 
a  Cholesky  factorization  of  A+E ,  where  £=0  if  A  is  safely  positive  definite,  and  E  is  a  diagonal 
matrix  chosen  to  make  A+E  positive  definite  otherwise.  The  factorization  costs  only  a  small 
multiple  of  n2  operations  more  than  the  standard  Cholesky  factorization.  We  present  a  new  algo¬ 
rithm  that  has  tnese  same  properties,  but  for  which  the  theoretical  bound  on  1 1 E  |  |  is  substan¬ 
tially  smaller.  It  is  based  upon  two  new  techniques,  the  use  of  Gerschgorin  bounds  in  selecting 
the  elements  of  E ,  and  a  new  way  of  monitoring  positive  definiteness.  In  extensive  computa¬ 
tional  tests  on  indefinite  matrices,  the  new  factorization  virtually  always  produces  smaller  values 
of  1 1£  1 1  than  the  existing  method,  without  impairing  the  conditioning  of  A+E.  In  some  cases 
the  improvements  are  substantial.  The  new  factorization  may  prove  useful  in  optimization  algo¬ 
rithms. 


1.  Introduction 


The  modified  Cholesky  factorization  was  introduced  by  Gill  and  Murray  [1974],  and  subse¬ 
quently  refined  by  Gill,  Murray,  and  Wright  [1981].  Given  a  symmetric,  not  necessarily  positive 
definite  matrix  A  e  ,  it  calculates  a  Cholesky  (i.e.  LLT ,  or  equivalently  JJDLT)  factorization 
of  A  +E ,  where  £  is  0  if  A  is  safely  positive  definite,  and  £  is  a  non-negative  diagonal  matrix  for 
which  A  +E  is  positive  definite  otherwise.  When  A  is  not  positive  definite,  there  is  an  a  priori 
error  bound  on  how  large  E  can  be  as  a  function  of  A  ;  the  practical  intent  is  that  £  not  be  much 
larger  than  is  necessary  to  make  A+E  positive  definite.  The  factorization  uses  only  about  n2l 2 
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more  operations  than  the  nonmal  Cholesky  factorization,  which  costs  approximately  each 
multiplications  and  additions. 

The  modified  Cholesky  factorization  has  become  very  important  in  optimization  algo¬ 
rithms.  Its  primary  use  is  in  line  search  methods  for  unconstrained  optimization,  where  it  is  used 
to  generate  a  descent  search  direction  when  the  Hessian  matrix  is  not  positive  definite  (see  e.g. 
Gill,  Murray,  and  Wright  [1981]).  It  is  also  used  in  line  search  methods  for  constrained  optimiza¬ 
tion  problems  (Gill,  Murray,  and  Wright  [1981]),  and  in  some  trust  region  methods  (Dennis  and 
Schnabel  [1983]). 

This  paper  presents  a  new  modified  Cholesky  factorization  algorithm  that  is  intended  for 
the  same  purposes  as  the  current  method.  The  new  algorithm  still  costs  only  a  small  multiple  of 
n2  operations  more  than  the  standard  Cholesky  factorization.  It  possesses  a  much  smaller  a 
priori  bound  on  the  size  of  the  diagonal  matrix  £,  and  in  extensive  computational  tests,  1 1£  1 1 
almost  never  is  larger,  and  in  many  cases  is  considerably  smaller,  than  that  generated  by  the  algo¬ 
rithm  of  Gill,  Murray,  and  Wright.  In  fact,  when  A  is  not  positive  definite,  1 1£  1 1  is  usually 
close  enough  to  the  negative  of  the  smallest  eigenvalue  of  A  that  the  new  algorithm  may  be  a 
usefui.  inexpensive  way  to  estimate  this  eigenvalue. 
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The  remainder  of  this  paper  is  organized  as  follows.  Section  2  contains  a  brief  summary  of 
the  motivation  and  uses  for  the  modified  Cholesky  factorization  in  optimization  algorithms.  Sec¬ 
tion  3  summarizes  the  goals  of  this  factorization  and  the  basic  challenges  that  it  presents,  and  sec¬ 
tion  4  briefly  describes  the  Gill.  Murray,  and  Wright  [1981]  algorithm. 

In  Section  5  we  present  the  new  algorithm.  It  contains  two  main  novel  features,  the  use  of 
Gerschgorin  bounds  in  determining  both  the  pivot  sequence  and  the  elements  of  E ,  and  a  new 
two-phase  strategy  for  determining  when  a  matrix  is  not  positive  definite  and  needs  to  be  per¬ 
turbed.  In  Section  6  we  present  the  results  of  an  extensive  computational  comparison  of  the 
behavior  of  the  new  and  old  factorizations  on  indefinite  test  matrices  of  dimensions  25  to  75. 
Section  7  contains  some  brief  conclusions. 

Throughout  the  paper  we  consider  the  Cholesky  factorization,  i.e  the  factorization  into 
LLT ,  where  L  is  lower  triangular,  as  opposed  to  the  LDLT  factorization,  where  L  is  unit  lower 
triangular  (ones  on  the  diagonal)  and  D  is  a  positive  diagonal  matrix.  The  conclusions  of  the 
paper  are  true  for  either  factorization.  We  use  the  Cholesky  because  we  believe  it  makes  the 
exposition  simpler.  We  use  the  version  of  the  Cholesky  factorization  that  makes  a  rank  one 
change  to  the  remaining  submatrix  at  each  iteration  (analogous  to  Gaussian  elimination),  rather 
than  the  version  that  delays  the  changes  to  any  element  until  it  is  in  the  pivot  column  (analogous 
to  Crout  reduction).  The  use  of  the  first  version  will  be  seen  in  Section  5  to  be  important  to  our 
algorithm. 

An  important  piece  of  notation,  used  throughout  the  paper,  is  that  we  use  Aj  to  denote  the 
principal  submatrix  that  remains  to  be  factored  at  the  start  of  the  jth  iteration  of  the  factorization. 
Thus,  Aj  is  an  n  +  l-j  x  n+\-j  matrix,  consisting  of  the  values  that  reside  in  rows  and  columns  j 
through  n  at  the  start  of  the  j,h  iteration.  For  consistency,  A  i  is  the  original  matrix  A.  This 
notation  is  expanded  in  Section  3. 
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2.  The  Use  of  the  Modified  Cholesky  Factorization  in  Optimization  Algorithms 


The  modified  Cholesky  factorization  was  introduced  by  Gill  and  Murray  [1974]  in  the  con¬ 
text  of  a  line  search  method  for  solving  the  unconstrained  optimization  problem 

minimize  /  :  Rn  — >  R  . 

x  e  R" 

Unconstrained  optimization  methods  generally  base  each  iteration  upon  the  quadratic  model  of 
f  (x)  around  the  current  iterate  xc 

m(xc+d)=f(xc)  +  Vf(xc)Td  +  'AdTHcd  ,  (2.1) 

where  Hc  is  the  Hessian  matrix  V2/  ( xc )  or  a  symmetric  approximation  to  it.  If  Hc  is  positive 
definite,  then  the  step  dc  =  -//C~'V/  ( xc )  is  the  minimizer  of  (2.1)  and  also  a  descent  direction  for 
f  (x ),  so  that  a  satisfactory  next  iterate  x+  always  can  be  found  by  choosing  x+  =  xc  -  Xcdc  for 
some  kc  >  0.  If  Hc  has  one  or  more  negative  eigenvalues,  however,  then  the  model  (2.1)  is 
unbounded  below,  and  the  direction  dc  =  -Hc~l  V/  (*c)  may  or  may  not  be  a  descent  direction  for 
/ (x).  In  this  case,  Gill  and  Murray  [1974]  suggested  calculating  dc  =  -{Hc+Ec)~x V/ (xc)  as  the 
search  direction,  where  Hc+Ec  is  positive  definite,  and  again  choosing  x+  =  xc  -  \cdc  for  some 
kc  >  0  by  a  line  search  procedure.  By  standard  convergence  results,  if  [  \  HC  [  |  is  uniformly 
bounded  above,  1 1  Ec  1 1  is  bounded  above  as  a  function  of  \  \HC  j  j ,  and  the  condition  number  of 
Hc+Ec  is  uniformly  bounded  above,  then  the  sequence  of  iterates  generated  by  a  standard  line 
search  method  that  uses  such  search  directions  will  be  globally  convergent  in  the  sense  that  the 
limit  of  the  sequence  of  gradients  converges  to  zero.  If  E  =  0  when  H  is  positive  definite,  then 
the  method  will  also  be  quadratically  convergent  in  the  neighborhood  of  a  strong  local  minimizer. 
(See  Dennis  and  Schnabel  [1983]  fora  summary  of  these  results.) 

The  algorithm  of  Gill  and  Murray  [1974]  for  choosing  Ec  satisfies  all  the  aforementioned 
conditions  on  Ec .  It  also  is  very  efficient  in  that  it  calculates  either  the  Cholesky'  factorization  of 
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Hc  if  it  is  positive  definite,  or  the  Cholesky  factorization  of  Hc+Ec  otherwise,  at  barely  a  higher 
total  cost  than  a  standard  Cholesky  factorization,  without  knowing  a  priori  whether  Hc  is  positive 
definite  or  not.  For  these  reasons,  it  has  become  a  standard  technique  in  line  search  methods  for 
unconstrained  optimization  problems.  The  algorithm  was  refined  somewhat  subsequent  to  its 
introduction,  and  a  modem  version,  that  has  performed  very  well,  is  given  in  Gill,  Murray,  and 
Wright  [1981]. 

The  modified  Cholesky  factorization  is  also  used  in  some  line  search  methods  for  solving 
constrained  optimization  problems.  Some  algorithms  for  solving  such  problems  also  generate  a 
sequence  of  unconstrained  quadratic  models,  and  if  the  Hessian  of  any  such  model  is  not  positive 
definite,  the  same  techniques  are  applicable.  For  more  details,  see  e.g.  Gill,  Murray,  and  Wright 
[1981], 

Some  trust  region  methods  for  optimization  problems  also  use  the  modified  Cholesky  fac¬ 
torization.  Whale  we  will  not  elaborate  upon  these  methods  here,  in  some  of  them  it  is  useful  to 
have  an  upper  bound  on  the  most  negative  eigenvalue  of  Hc ,  and  the  norm  of  the  matrix  Ec 
from  the  modified  Cholesky  factorization  serves  this  purpose.  Dennis  and  Schnabel  [1983]  dis¬ 
cuss  trust  region  methods  that  incorporate  the  modified  Cholesky  factorization,  and  Shultz, 
Schnabel,  and  Byrd  [1985]  show  how  to  construct  efficient  and  globally  convergent  trust  region 
methods  if  a  satisfactory  upper  bound  on  Xi  is  available.  The  methods  described  in  this  paper 
produce  bounds  that  are  satisfactory  in  this  sense. 

Our  general  reasons  for  pursuing  a  new  modified  Cholesky  factorization  algorithm  were 
given  in  Section  1  and  are  elaborated  further  at  the  end  of  Section  4.  In  addition,  from  an  optimi¬ 
zation  perspective,  the  new  method  may  lead  to  a  new  efficient  and  simple  implementation  of 
trust  region  methods.  Wc  discuss  this  possibility  briefly  in  Section  7. 


5 


3.  Goals  and  Challenges  of  the  Modified  Cholesky  Factorization 

Given  a  matrix  A  e  Rnxn  that  is  symmetric  but  not  necessarily  positive  definite,  the  objec¬ 
tive  of  the  modified  Cholesky  factorization  is  to  construct  a  Cholesky  ( LLT )  factorization  of  a 
positive  definite  matrix  A+E ,  where  E  is  a  non-negative  diagonal  matrix.  More  specifically,  the 
factorization  has  the  following  four  goals  :  1)  If  A  is  safely  positive  definite,  E  should  equal  0  ; 
2)  If  A  is  indefinite,  1 \E  1 1  should  not  be  much  greater  than  -Xi(A),  where  X.i(A)  is  the  most 
negative  eigenvalue  of  A ;  3 )  A+E  should  be  a  reasonably  well  conditioned  matrix,  and  4)  the 
cost  of  the  factorization  should  only  be  a  small  multiple  of  nr  operations  more  than  the  cost  of 
the  normal  Cholesky  algorithm. 

One  obvious  way  to  select  E  would  be  to  find  X](A),  and,  if  X](A)  <  0,  let  E  equal 
[— \i(.4 )  +  e]  / ,  for  some  small  positive  e.  This  would  satisfy  the  first  3  goals,  but  the  expense  of 
finding  the  eigenvalues  of  a  matrix  exceeds  the  cost  requirements  specified  in  our  final  goal  by  at 
least  an  order  of  magnitude.  Thus  the  major  challenge  in  developing  a  modified  Cholesky  factor¬ 
ization  is  to  satisfy  the  first  3  goals  while  not  increasing  the  cost  by  more  than  0(n2).  Among 
other  things,  this  implies  that  a  one  pass  algorithm  is  essential. 

There  is  a  basic  tradeoff  in  deciding  upon  the  size  of  each  of  the  diagonal  elements  of  the 
matrix  E .  Let  the  n+\-j  x  n+l-j  principal  submatrix  remaining  to  be  factored  at  the  j,h  itera¬ 
tion,  consisting  of  the  current  elements  in  rows  j  through  n  and  columns  i  through  n  ,  be  denoted 


where  a }eR  is  the  current  j'h  diagonal  element,  aj&Rn~’  is  the  current  vector  of  elements  in 
column  j  below  the  diagonal,  and  AJeR('l'J)y(-n~J\  (We  will  use  the  convention  that  the  sub¬ 
scripts  of  the  elements  in  the  vector  at  arc  i  =  j  + 1  through  n,  so  that  (a; ),  =  A,y,  i=j+ 1,  •  ■  •  ,n). 
Then  at  the  j,h  iteration,  the  normal  Cholesky  factorization  algorithm  computes  L;;  =  ya;,  LtJ  = 
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(aj),/Ljj,  i=j+l,  •  •  ,n,  and  (assuming  the  changes  to  the  remaining  elements  are  not  deferred) 

A  —A  —  a’ 


In  the  modified  Cholesky  factorization,  the  computations  are  instead  Ljj  =  \a;  +bj ,  LtJ  = 
(a; ),  /£,_,  ,  i=j+ 1 ,  •  •  ■  ,n ,  and 

a,aj 


xj+'-aj  a~rk~' 


where  8;  is  greater  than  or  equal  to  zero  and  is  the  jth  diagonal  element  of  the  matrix  £.  The 
tradeoff  between  making  5 i;  large  or  small  leads  to  the  following  dilemma.  If  a;  is  negative  and 

a  ■  qJ 

8.  is  chosen  so  small  that  a.  +8,  is  barely  greater  than  0,  then  1  {  will  be  large,  and  Aj+\  will 

aj  +0j 

have  large  negative  eigenvalues,  implying  that  the  elements  of  £  in  some  remaining  iterations 
will  need  to  be  large.  On  the  other  hand,  if  8;  is  large,  then  we  have  already  added  a  large  amount 
to  the  diagonal.  The  challenge  lies  in  adding  the  appropriate  amount  to  the  diagonal  of  A  at  the 
appropriate  time  in  the  algorithm.  This  requires  that  the  algorithm  consider  more  information 
than  just  the  value  of  a;  in  chosing  5j.  It  will  be  seen  in  Sections  4  and  5  that  considering  the 
values  of  a;  as  well  as  is  sufficient  to  produce  effective  modified  Cholesky  factorization  algo¬ 
rithms  in  both  theory  and  practice. 


4.  The  Modified  Cholesky  Factorization  of  Gill,  Murray,  and  Wright 

Gill,  Murray,  and  Wright  [1981]  give  a  modified  Cholesky  factorization  algorithm  that  is 
designed  to  satisfy  the  four  goals  stated  at  the  start  of  Section  3.  Given  a  symmetric  but  not 
necessarily  positive  definite  matrix  A  e  Rnxn,  it  computes  an  LDLT  factorization  of  a  matrix 
A  +E ,  where  £  is  a  non-negative  diagonal  matrix.  In  this  section,  we  briefly  review  their  method. 
To  be  consistent  with  the  remainder  of  the  paper,  we  restate  their  algorithm  in  terms  of  the 
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Cholcsky  (LLJ)  decomposition.  This  does  not  change  any  of  the  important  properties  of  the 
algorithm  that  we  discuss. 


At  each  iteration,  the  algorithm  of  Gill,  Murray,  and  Wright  first  selects  the  maximum  (in 
absolute  value)  diagonal  element  in  the  remaining  principal  submatrix  Aj,  and  pivots  it  to  the  top 
left  position  by  interchanging  its  row  and  column  with  the  pivot  (J,h)  row  and  column,  respec¬ 
tively.  Then,  if  Aj  is  now  the  pivoted  principal  submatrix,  with 


A,= 


O.J  a  J 

aj  A, 


(4.1) 


where  a;  is  the  diagonal  element  in  the  pivot  column  and  aj  is  the  remainder  of  the  pivot 
column,  the  elements  of  the  ne>  t  principal  submatrix  AJ+\  are  computed  by 


4+1=/t-id- 

J+i  J  a.j  +6j 


(4.2) 


The  value  of  5 ;  at  each  iteration  is  chosen  to  be  the  smallest  non-negative  number  such  that 


0^  \  \0:  II.  <  o 

aj+dj  p  ’ 

where  P>0  is  an  a  priori  bound  selected  to  minimize  a  worst  case  bound  on  |  \E  \  |«,.  If  a;<0 
and  this  value  of  5  is  less  than  -2a; ,  then  5 y  =  -2 a;  instead. 


What  remains  to  be  described  is  the  choice  of  p.  Let  ^  =  the  maximum  magnitude  of  the 
off-diagonal  elements  of  the  original  matnx  A  ,  and  y=  the  maximum  magnitude  of  the  diagonal 
elements  of  A  .  Gill  and  Murray  [1974]  produce  an  error  bound  on  1 1  £  1 1 as  a  function  of  P  for 
their  algorithm,  and  show  that  it  is  minimized  when  P:  =  ^  /  ^n2- 1.  For  that  choice  of  p, 

1 1£  1 1„  <  2  +  (n-1) )  ^  +  2y ,  (4.3) 


or  roughly 


1 1  £  li-<4nq+2Y 


(4.4) 


for  moderate  to  large  n .  However  this  choice  of  p  may  cause  positive  definite  matrices  A  to  be 
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perturbed,  so  the  selection  of  P  is  adjusted  in  order  to  avoid  this.  Gill  and  Murray  [1974]  also 
show  that  the  choice  P  >  vy  guarantees  that  £  =  0  for  positive  definite  A  .  Thus  their  algorithm 
assigns  P2  to  be  the  maximum  value  of  y,  %  /  or  machine  epsilon.  If  y  >  %  /  Vn 2-l,  the 

usual  case,  then  the  error  bound  for  this  adjusted  P  becomes 

||£  |U<(n2+l)Y+2(n-l)4  +  ^/Y.  (4.5) 

which  is  larger  than  (4.3). 

The  modified  Cholesky  factorization  algorithm  of  Gill,  Murray,  and  Wright  [1981]  has  pro¬ 
ven  to  be  an  effective  factorization  in  the  context  of  optimization  algorithms,  and  as  will  be  seen 
in  Section  6,  does  quite  a  good  job  of  fulfilling  the  four  goals  stated  at  the  beginning  of  Section  3 
(The  cost  of  the  algorithm  is  approximately  n2  comparisons,  and  O(n)  arithmetic  operations, 
more  than  the  standard  Cholesky  factorization.)  It  should  be  noted  that  while  the  diagonal  pivot¬ 
ing  employed  by  the  a’gorithm  of  Gill,  Murray,  and  Wright  does  not  affect  the  analysis  described 
above,  it  is  very  important  to  its  good  practical  performance. 

There  appear  to  us  to  be  two  important  ways  in  which  the  algorithm  of  Gill,  Murray,  and 
Wright  [1981]  might  still  be  improved.  First,  the  bounds  (4.3)  and  particularly  (4.5),  which  are 
attained  by  the  algorithm  for  particular  matrices  A,  are  far  from  optimal,  as  will  be  discussed  in 
Section  5.  Secondly,  the  results  of  Section  6  show  that  in  practice,  the  value  of  1 1£  1 1„  pro¬ 
duced  by  the  algorithm  is  sometimes  many  times  too  large.  The  new  method  described  in  Sec¬ 
tion  5  primarily  attempts  to  improve  upon  the  algorithm  of  Gill,  Murray,  and  Wright  in  these  two 
regards. 
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5.  The  New  Modified  Cholesky  Factorization 


Our  new  modified  Cholesky  factorization  algorithm  incorporates  two  main  new  techniques. 
The  first  involves  using  Gerschgorin  Circle  Theorem  bounds  to  determine  the  elements  in  the 
non-negative  diagonal  matrix  £  that  is  added  to  an  indefinite  matrix  A  in  order  to  make  it  posi¬ 
tive  definite.  The  second  is  a  new  technique  for  assuring  that  one  does  not  perturb  an  already 
positive  definite  matrix,  i.e.  that  £=0  if  A  is  positive  definite.  In  Section  5.1  we  describe  the 
new  technique  that  uses  Gerschgorin  bounds  to  decide  how  much  to  add  to  the  diagonal,  and 
show  that  it  leads  to  an  improved  upper  bound  on  |  \E  |  |„.  In  Section  5.2  we  describe  the  new 
technique  for  assuring  that  a  positive  definite  matrix  is  not  perturbed,  and  show  that  unlike  the 
strategy  of  Gill,  Murray,  and  Wright  [1981],  it  can  be  incorporated  into  a  modified  Cholesky 
decomposition  algorithm  without  causing  the  bound  on  |  \  E  ]  |»  to  grow  significantly.  In  Sec¬ 
tion  5.3  we  describe  our  full  new  algorithm,  which  integrates  these  two  techniques,  discuss  its 
theoretical  properties,  and  give  a  simple  example  comparing  it  to  the  method  of  Gill,  Murray,  and 
Wright  [1981], 

5.1  Using  Gerschgorin  Circle  Theorem  bounds  to  determine  the  amounts  to  add  to  the 
diagonal 

In  this  section,  we  introduce  our  basic  strategy  for  choosing  a  non-negative  diagonal  matrix 
E  such  that  A  +£  is  positive  semi-definite.  (The  exposition  and  theory  are  cleaner  if  we  allow  the 
possibility  that  A+E  is  positive  semi- definite;  the  changes  to  assure  that  it  is  strictly  positive 
definite  are  small  in  practice  and  theory,  and  are  described  in  Section  5.3.)  The  strategy  described 
in  this  section  may  result  in  £  having  some  positive  elements  even  if  A  is  positive  definite;  the 
modifications  we  make  to  avoid  this  are  described  in  Section  5.2. 

The  Gerschgorin  Circle  Theorem  states  that  if  A  e  Rn*n  is  a  symmetric  matrix  with  eigen¬ 
values  >_!<  <Kn ,  then  each  X,  €  (G 1UG2U  •  ■  ■  uG„ },  where 
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C,  -  [Au  -  ^  | Aij  |  ,Au  +  2^  \Aij  |  }  =  [Glow,  ,  Gup,}  ,  i  =  1,  •  •  ■  ,rt.  (5.1.1) 

Jj*  Jj-u 

Thus,  since  A  -X[ I  is  positive  semi -definite,  an  upper  bound  on  the  amount  that  must  be  added  to 
the  diagonal  of  A  to  make  A  +£  positive  semi-definite  is 

Maxadd-GCT  *  max  (0  ,  -Glow, }  .  (5.1.2) 

An  objective  of  the  new  modified  Cholcsky  factorization  is  to  find  £  for  which  A  +£  is  positive 
semi-definite  and  for  which  we  can  guarantee 

||£  ||„  <  Maxadd-GCT  ,  (5.1.3) 

at  least  in  the  case  when  we  are  not  concerned  about  perturbing  a  positive  definite  matrix.  This 
bound  is  easily  achieved  as  indicated  by  the  following  lemma  and  theorem.  Note  that  since, 
using  the  notation  of  Section  4, 

Maxadd-GCT  <  y+(n-l)^  ,  (5.1.4) 

(5. 1 .3)  is  guaranteed  to  be  stronger  than  (4.3). 


Lemma  5.1.1.  Let  .4  e  Rnx*  have  the  Gerschgorin  Circle  Theorem  bounds  G,  ,  (=1,  •  •  •  ,  n  given 


in  (5.1.1).  Denote  A  = 


a  a  1 
a  A 


,  where  cteR,  aeRn  ',  A  e  R(n  Let  A  =  A  - 


aa‘ 

a+6 


have  Gerschgorin  Circle  Theorem  bounds  G,  ,  i=  2,  *  •  •  ,n,  where 


G;  = 


=  [4„  -  £  |  Aij  |  ,  Au  +  \AtJ  |  ]  g  [Glow,  ,  G  up,  ]  ,  i  =2,  •••  ,n. 


/=- 


/=- 

j** 


Then  if 


5  >  max{0,  |  \a  |  |  -  a  }  , 


(5.1.5) 


G,qG , ,  /=  2,  ■  ■  •  ,n . 


Proof.  Note  that  (5.1.5)  guarantees  a+5  >  0,  with  equality  possible  only  if  a= 0.  If  a- 0,  we  may 
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assume  that  we  set  A  =  A  so  that  the  lemma  is  trivially  true.  For  the  remainder  of  the  proof,  we 
assume  a+5  >  0. 


Let  us  again  use  the  convention  that  the  subscripts  of  the  vector  a  are  i  =  2  through  n ,  so 
that  a,  =  An,  i=  2,  •  ■  •  ,n .  Then  we  have 


row  i  of  A  =  row  i  of  A  - 


a  T  a, 
"a+5” 


i  =  2, 


Thus 


j ** 


^  ( 1 1  a  | )  j  -  |  a,  | )  |  a,  | 
-  - S+5 - 


(5.1.6) 


Also, 


(5"L7) 

Combining  (5.1.6)  and  (5.1.7),  recalling  that  the  term  An  =  a ,  is  present  in  G,  but  not  in  G,  ,  and 
using  5  >  ||  a  1 1  i-a  =  -  Glow  h  w'e  get 


Glow,  -  Glow,  >  |  a;  |  - 


llfl  111 \di\ 
5S5 


=  ~5-(5  +  (a-|k  II,))  =  -^-(5  +  G/ow,)  >0  , 
i  =  2,  •  •  • ,  n .  Similar  calculations  show  that 

G  upi  -  Gup,  <  -  (5  +  Glow ,  +  2  |a,  |)  <0  . 

Thus  G,  cC?, .  □ 

Lemma  (5.1.1)  shows  that  the  choice  (5.1.5)  causes  the  Gerschgorin  inteivals  to  contract 
Thus  it  is  almost  immediate  that  if  we  make  this  choice  with  equality  at  each  iteration  of  the 
modified  Cholesky  factorization,  we  will  satisfy  (5.1.3). 
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Theorem  5.1J.  Let  A  e  RnXA  have  the  Gerschgorin  Circle  Theorem  bounds  (5.1.1),  and  let 
Maxadd-GCT  be  defined  by  (5.1.2).  Suppose  that  at  each  iteration  of  the  modified  Cholesky 
factorization,  the  remaining  principal  submatrix  Aj  e  R(n+l~j )*(«+W)  is  given  by  (4.1),  (A  i=A ), 

5,  =  max{0,  ||  a}  1 1 1— oc, }  ,  (5.1.8) 

and  4;+i  e  is  calculated  by  (4.2).  Let  E  =  diag{5i,  •  ■  •  ,8„  ).  Then  A+E  is  positive 

semi-definite  and  (5.1.3)  is  true.  Furthermore,  if  any  diagonal  pivoting  strategy  is  used  at  each 
iteration  (i.e.  rows  and  columns  i  and  j  are  swapped  for  some  i>j),  (5.1.3)  remains  true. 

Proof.  The  proof  is  almost  immediate  from  Lemma  5.1.1.  Let  {G>)t,  i-j ,  •  ■  •  ,n  denote  the  Ger¬ 
schgorin  interval  obtained  from  row  i  of  Aj,  and  let  (G-'/ow),-  denote  the  lower  bound  of  (G;), . 
From  Lemma  5.1.1,  the  choice  (5.1.8)  assures  that 

(G->+llow)i  £(GJlow)i  ,  1  <j<i<n  .  (5.1.9) 

From  (5.1.8),  (5.1.9),  and  (5.1.2), 

5;  <  — (G; low)j  <  -Glcnvj  <  Maxadd-GCT  . 

This  completes  the  proof  of  the  first  part  of  the  theorem.  Since  diagonal  pivoting  of  a  symmetric 
matrix  only  permutes  its  Gerschgorin  intervals  but  does  not  alter  them,  and  since  Lemma  5.1.1 
and  the  above  part  of  this  proof  make  no  use  of  the  ordering  of  the  Gerschgorin  intervals,  the 
theorem  is  unaffected  by  any  diagonal  pivoting  strategy.  □ 

Our  algorithm  makes  one  further  modification  to  the  strategy  (5.1.8)  for  selecting  8;.  It  is 
that  we  require  the  amount  that  is  added  to  the  diagonal  at  iteration  j  to  be  at  least  as  great  as  the 
greatest  amount  that  has  been  added  to  the  diagonal  at  any  previous  iteration.  That  is, 

5 j  =  max{0,  | \aj  \  |  i-a;,  5;_i}  .  (5.1.10) 

It  is  straightforward  that  Theorem  5.1.2  remains  true  with  (5.1.10)  in  place  of  (5.1.8),  because  by 
induction  this  choice  still  satisfies  (5.1.3),  and  trivially  it  still  satisfies  (5.1.5). 
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The  rationale  for  this  modification  is  as  follows.  At  any  iteration,  suppose  8 i,  given  by 
(5.1.10)  is  larger  than  that  given  by  (5.1.8)  i.e.  max{0,  ||a;  1 1 1 — oty }  <  5j-\.  Then  the  new 
choice  (5.1.10)  doesn’t  change  the  value  of  1 1£  1 1«  at  this  point  in  the  algorithm,  because  8 = 
8;_i.  It  may  cause  subsequent  values  of  8,  to  be  smaller,  however,  because  it  results  in  a  larger 
a j+8j  and  hence  a  smaller  multiple  of  ajaJ  is  subtracted  from  Aj,  which  means  that  AJ+ 1  has 
larger  or  identical  eigenvalues  than  it  would  have  using  (5.1.8).  This  reasoning  does  not  imply 
that  the  final  value  of  1 1£  1 1«  will  be  smallerusing  (5.1.10)  than  using  (5.1.8),  but  it  makes  this 
seem  likely,  and  in  practice  the  modification  appears  to  be  helpful  in  some  cases  and  virtually 
never  harmful. 

The  total  additional  work  required  by  the  modifications  to  the  Cholesky  factorization 
described  so  far  in  this  section  is  approximately  n2/l  additions,  for  the  computation  of  |  \a.j  !  1 1  at 
each  iteration.  In  comparison,  the  additional  work  for  the  algorithm  of  Gill,  Murray,  and  Wright 
[1981]  is  approximately  n2/ 2  comparisons,  because  it  computes  |  |a;  1 1„. 

Finally,  as  noted  in  Section  4,  it  is  important  in  practice  to  use  a  diagonal  pivoting  strategy, 
even  though  it  does  not  affect  the  theoretical  results  given  above.  We  could  simply  pivot  based 
on  the  maximum  diagonal  element,  as  is  done  by  Gill,  Murray,  and  Wright  [1981].  However, 
recall  that  the  amount  we  add  to  the  diagonal  at  iteration  j  will  be  at  least  the  negative  of  the 
lower  Gerschgorin  bound  of  the  pivot  row  for  that  iteration.  This  suggests  that  we  instead  select 
as  pivot  row  (and  column)  the  row  (and  column)  for  which  the  lower  limit  of  the  Gerschgorin 
interval  is  largest.  If  this  Gerschgorin  bound  is  positive,  then  we  will  not  increase  1 1  £  1 1  „  at  this 
iteration,  and  the  Gerschgorin  intervals  will  contract. 

This  pivoting  strategy  assumes  that  the  Gerschgorin  bounds  for  each  remaining  row  are 
available  at  each  iteration.  This  would  require  a  total  of  approximately  n3/2  additional  additions, 
which  is  too  high.  An  alternative  is  to  pivot  based  on  the  estimates  of  the  Gerschgorin  bounds 
that  result  from  the  proof  of  Lemma  5.1.1.  If  we  let  (g-0,  denote  the  estimate  of  the  lower  bound 


14 


of  the  Gerschgorin  interval  of  row  i  ofAj,  then  from  the  proof  of  Lemma  (5.1.1), 

(£'+1),  =  igJ)<  +  I  (aj)i  |  1  -  .  i=7+l. 

For  the  entire  algorithm,  this  requires  approximately  n2/ 2  each  additional  multiplications  and 
additions.  To  begin  this  process,  the  Gerschgorin  bounds  of  the  original  matrix  A  must  be  calcu¬ 
lated,  which  costs  an  additional  n2  additions.  Thus  the  total  costs  of  the  modifications  to  the 
Cholesky  factorization  discussed  in  the  section  are  2n 2  additions  and  n2ll  multiplications. 

We  should  mention  that  the  strategy  for  preserving  positive  definiteness  that  we  discuss  in 
Section  5.2  will  often  cause  the  additional  costs  given  in  this  section  to  be  reduced  considerably. 

5 2  The  Strategy  for  Not  Perturbing  Positive  Definite  Matrices 

In  this  section  we  introduce  our  strategy  for  assuring  that  our  modified  Cholesky  decompo¬ 
sition  does  not  perturb  an  already  positive  definite  matrix,  while  still  guaranteeing  that  if  the 
matrix  is  not  positive  definite,  then  the  amount  that  is  added  to  the  diagonal  is  not  too  large.  The 
strategy  is  quite  simple.  We  divide  our  decomposition  algorithm  into  two  phases.  In  the  first 
phase,  we  apply  the  standard  Cholesky  decomposition  (the  version  described  in  Section  3  where 
we  make  a  rank -one  modification  to  the  remaining  submatrix  at  each  iteration)  for  A:>0  iterations, 
stopping  at  the  first  occasion  that  the  next,  k+\u  iteration  would  cause  any  diagonal  element  in 
the  next  remaining  submatrix  A*+2  to  become  non-positive.  At  this  point  we  know  that  the 
current  submatrix  A^+i,  as  well  as  the  original  matrix  A ,  is  not  positive  definite.  We  then  switch 
to  the  second  phase,  where  we  apply  the  modified  Cholesky  decomposition  algorithm  described 
in  Section  5.1  for  the  remaining  n-k  iterations  of  the  decomposition. 

If  the  original  matrix  A  is  numerically  positive  definite,  then  this  strategy  results  in  the  nor¬ 
mal  Cholesky  decomposition  being  performed  throughout.  If  A  is  not  positive  definite,  then  this 
strategy  results  in  the  normal  Cholesky  decomposition  being  performed  for  k  e  [0.  n-2] 
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iterations, 

followed  by  the  application  of  the  modified  Cholesky  decomposition  to  Ak+U  which  results  in  the 
Cholesky  decomposition  of  Ak+\+E  for  some  non-negative  diagonal  matrix  £.  The  overall  result 
is  the  Cholesky  decomposition  of  A+E ,  where  £  is  £  augmented  with  zeroes  in  the  first  k  diago¬ 
nal  positions  (modulo  pivoting). 

The  crucial  question  is  "how  large  is  1 1£  1 1«,,  and  hence  1 1£  1 1«?”.  Section  5.1  gives  a 
bound  for  1 1  £  1 1  „  that  depends  on  the  sizes  of  the  elements  of  /4i+1.  In  Theorem  5.2.1,  we  show 
that  our  two-phase  strategy  assures  that  no  element  in  /4*+!  has  grown  by  more  than  the  value  of 
the  largest  diagonal  element  element  in  A .  This  in  turn  means  that  our  decomposition  still 
achieves  a  good  bound  on  |  [  £  1 1  „  in  terms  of  the  original  matrix  A . 

Theorem  5.2.1.  Let  A  e  Rnxn  ,  and  let  y  =  max 
{  |  A,i  |,1  <i  <n  } ,  q  =  max  {  |  AtJ  |  ,  1  <i  <j  <n  }.  Suppose  we  perform  the  standard  Chole¬ 
sky  decomposition  as  described  in  Section  3  for  k  >  1  iterations,  yielding  the  remaining  principal 
submatrix  Ak+\  e  -*)*<«-*)  (whose  elements  are  denoted  (Ak+\),j  ,  £+1  <i,j  <n),  and  let  y  = 

max  {  I  (Ak+i)tl  |  ,  k+1  <i  <n  }  and  £  =  max  {  |  (Ak+\)ij  | ,  k+\  <i  <j  <n  }.  Then  if 

(Ak+\)u  >Q,k+l<i  <n,  then  y<y  and  £<^  +  y. 


Proof:  Let  A  = 


B  CT 
CF 


,  where  B  e  Rkxk  ,  C  e  R<-n  k')xk  ,F  €  After  k  iterations 


of  the  Cholesky  factorization,  the  first  k  columns  of  the  Cholesky  factor  L  have  been  determined; 


denote  them  bv 


where£e/?*x*  is  triangular  and  M  e  *)x*.  Then 


B  =L  L7  ,C  =M  17  ,  and F  =M  MT  +Ak+\ . 


(5.2.1) 
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From  (5.2.1),  Fu  =  ||  Mrowi  ||22  +  (At+i)u  ,k+l<i  <n  ,  so  that  from  Fa  <yand  G4*+i)«  SO, 

\\Mrowi  ||22<y.  (5.2.2) 

Thus  for  any  off-diagonal  element  oM*+i ,  (5.2.1),  (5.2.2)  and  the  definition  of  £  imply 

(Ak+l)tJ  <  F,j -(Mrowi)(.Mrowj)T<Z,+ y.  (5.2.3) 

which  shows  £<£  +  y.  Also  for  all  the  diagonal  elements  of  A*+i,  G4*+i)u  SO,  (5.2.1)  and  the 
definition  of  y  imply 

0<(Ak+l)u<Fu  <y.  (5.2.4) 

which  shows  y  <  y  and  completes  the  proof.  □ 

We  note  that  the  result  of  Theorem  5.2.1  is  independent  of  the  diagonal  pivoting  strategy 
that  is  used.  We  also  note,  however,  that  the  technique  of  proof  of  Theorem  5.2.1  actually  shows 
that  the  largest  off-diagonal  element  in  A*+i  is  at  most  equal  to  the  largest  off-diagonal  in  F  plus 
the  largest  diagonal  in  F ,  where  F ,  as  defined  in  the  proof  of  Theorem  5.2.1,  is  the  diagonal  sub¬ 
matrix  of  A  that  corresponds  to  Ak+ Thus  a  pivoting  strategy  that  uses  the  larger  diagonal  ele¬ 
ments  as  pivots  in  the  first  phase  will  limit  the  growth  in  the  off-diagonal  of  Aj+\  even  more  than 
is  indicated  by  Theorem  5P  • .  Our  phase  one  algorithm  pivots  the  largest  remaining  diagonal 
element  to  the  top,  and  thus  is  likely  to  have  this  effect  of  further  limiting  element  growth. 

The  possibility  of  incorporating  this  two-phase  strategy  into  the  method  of  Gill,  Murray, 
and  Wright  [1981]  is  discussed  in  the  next  section. 
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5.3  The  Complete  New  Algorithm 

We  have  now  presented  all  the  main  parts  of  out  new  modified  Cholesky  decomposition 
algorithm.  An  outline  of  the  complete  algorithm  is  given  in  Algorithm  5.3.1,  and  a  fully  detailed 
description  is  given  in  Appendix  I.  To  summarize,  the  first  phase  of  the  algorithm  applies  the 
standard  Cholesky  decomposition,  using  a  diagonal  pivoting  strategy  that  pivots  the  largest 
remaining  diagonal  element  to  the  top  left.  This  phase  ends  when  the  next  iteration  of  the  stan¬ 
dard  Cholesky  decomposition  would  cause  any  diagonal  element  in  the  remaining  submatrix  to 
become  non-positive.  In  the  second  phase,  the  modified  Cholesky  decomposition  described  in 
Section  5.1  is  applied  to  the  remaining  submatrix.  This  phase  determines  what  to  add  to  the  diag¬ 
onal  at  each  iteration  from  the  lower  Gerschgorin  bound  of  the  pivot  row,  and  pivots  based  upon 
estimates  of  these  lower  Gerschgorin  bounds. 

Three  additional,  relatively  minor  features  have  been  incorporated  mto  Algorithm  5.3.1  to 
guard  against  the  resultant  A  +E  being  singular  or  very  ill-conditioned.  First,  the  switch  to  phase 
two  is  made  when  any  diagonal  element  of  the  remaining  submatrix  would  become  less  than  xy, 
rather  than  less  than  zero  as  is  discussed  in  Section  5.2.  Here  y  is  again  the  maximum  diagonal  of 
A  ,  and  x  is  a  small  constant  (we  choose  x  =  (macheps)iri).  This  means  we  may  perturb  a  positive 
definite  matrix  if  its  condition  number  is  greater  than  1/x.  Second,  in  phase  two,  to  assure  that 
A  +£  is  positive  definite  rather  than  positive  semi-definite,  we  set  (using  the  notation  of  Section 
5.1)  each 

5 j  =  max  (0, -a,  +max{  ||a,  1 1 1  ,xy},  5y_i} 

where  the  x  y  term  is  new.  This  causes  the  bound  (5. 1 .3)  on  1 1  £  1 1  „  to  increase  a  tiny  bit  to 

|  \E  |  j„  <  Maxadd-GCT  +  xy  .  (5.3.1) 

but  in  conjunction  with  the  preceding  change,  allows  us  to  bound  the  condition  number  of  A  +£. 
Finally,  at  the  final  iteration  of  phase  two,  when  only  a  2x2  submatrix  An-\  remains,  we  use  a  dif¬ 
ferent  strategy  :  we  calculate  the  eigenvalues  and  of  A„_i,  and  if  X/0  <  x  max(XA,  ,y},  we 
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Algorithm  5  J.l  --  Modified  Cholesky  Decomposition 

Given  A  <=  Rnxn  symmetric  and  x  (e.g.  x=  (macheps)1/3), 
find  factorization  L  Lr  of  A  +£  ,  £  >  0 

Y  :=  max  |  Au  \  ;  1 

(*  Phase  One,  A  potentially  positive  definite  *) 

While  j  <n  do 

Pivot  on  maximum  diagonal  of  remaining  submatrix 
If  min  {  An  -  }  <xy 

then  go  to  Phase  Two 

else  perform  j,h  iteration  of  standard  Cholesky  factorization  and  increment  j 

(*  Phase  Two,  A  not  positive  definite  *) 

k  :=  j  -  1  (*  k  =  number  of  iterations  performed  in  Phase  One  *) 

Calculate  lower  Gerschgorin  bounds  of  A*^i 
For  j  :=  £+1  to  n-  2  do 

Pivot  on  maximum  lower  Gerschgorin  bound  estimate 
Calculate  £;/  and  add  to  A J; 

(*  eji  =min(  O’-Ajj  +max{  £  |  Atj  | ,  x  y}  ,  £;_i  j-i  }  *) 

update  Gerschgorin  bound  estimates 
perform  jlh  iteration  of  factorization 
complete  factorization  of  final  2x2  submatrix  using  its  eigenvalues 


choose  5„_i  so  that  the  lz  condition  number  of  A„_|  +5/  =  1/t  (we  also  require  Xto+5n_j  >  Ty). 
This  generally  gives  a  smaller  value  of  8*_i  than  the  Gerschgorin  circle  theorem  based  strategy 
would,  and  in  theory  it  is  straightforward  to  show  that 

1 8„_i  |  <  -Xlo  +  -pL-  (khl -hc)<  Maxadd-GCT  +  -^L  y  (5.3.2) 

since  ~Xlo  <  Maxadd-GCT  and  Xhl  -Xio  <2  ( Maxadd-GCT  +y). 

The  theoretical  properties  of  our  full  algorithm  are  summarized  in  Theorem  5.3.2. 

Theorem  5J.2.  Let  A  ,  y,  and  ^  be  defined  as  in  Theorem  5.2.1,  suppose  we  apply  the  modified 
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Cholesky  factorization  algorithm  in  Appendix  I  to  A  ,  resulting  in  the  factorization  ££rof  A+£. 
If  .4  is  positive  definite  and  at  each  iteration,  Ljj  2>xy,  then  £  =  0.  Otherwise,  £  is  a  non¬ 
negative  diagonal  matrix,  with 

| !  £  1 1  -  S  Gersch  +  ( Gersch  +  y)  (5.3.3) 

where  Gersch  is  the  maximum  of  the  negative  of  the  lower  Gerschgcnn  bounds  of  A*+i  that  are 
calculated  at  the  start  of  Phase  Two.  If  k=Q  then 

Gersch  =Maxadd-GCT  <y+(n-l)£  (5.3.4) 

where  Maxadd-GCT  is  given  by  (5. 1.1-2),  otherwise 

Gersch  <  [n  -(*+1)]  (y+q).  (5.3.5) 

Proof:  Immediate  from  Theorem  5.1.2,  Theorem  5.2.1,  and  equations  (5.3. 1-2).  □ 

It  is  also  possible  to  produce  an  upper  bound  on  the  condition  number  of  A  +E ,  of  the  same 
son  that  is  provable  for  the  Gill,  Murray,  and  Wright  [1981]  algorithm.  The  key  properties 
needed  for  this  are  that  1 1  £  1 1 ,  and  hence  max {£,-,■ },  is  bounded  above,  that  min {£,; )  is  bounded 
below  (by  \'xy),  and  that  |  LtJ  \  <  L„  for  all  1  <j  <i<n.  (The  final  property  comes  from  diagonal 
pivoting  and  the  look-ahead  property  in  phase  one,  and  from  the  Gerschgorin  bound  strategy  for 
choosing  8 y  in  phase  two.)  The  bound  on  the  condition  number  that  one  can  obtain  is  of  mainly 
theoretical  interest,  since  it  is  exponential  in  n ;  the  computational  results  of  Section  6  show  that 
the  condition  number  of  A  is  bounded  above  by  about  1/T  in  practice. 

We  note  that  our  two  phase  strategy  could  also  be  incorporated  into  the  method  of  Gill, 
Murray,  and  Wright  [1981],  and  that  this  would  result  in  a  significant  improvement  in  their  upper 
bound  on  [  |  £  1 1  This  could  be  done  by  using  the  same  two  phase  structure,  and  replacing  our 
phase  two  by  their  modified  Cholesky  decomposition.  If  this  were  done,  their  algorithm  could 
simply  choose  (32  =  i,  '<(n  -k)2-\  in  phase  two,  rather  than  the  maximum  of  this  quantity  and  y 
(where  q  and  y  are  defined  as  in  Theorem  5.2.2)  because  it  would  know  that  it  is  dealing  with  a 
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non-positive  definite  matrix.  Hence  the  resultant  method  would  achieve  the  bounds  (4.3-4)  if  it 
switched  to  phase  two  immediately,  and 

WE  |U  ^  4 (n -£))•+  2 7  <  4(n-k)£  +  y)  +  2y 

otherwise.  This  would  be  a  significant  improvement  over  the  current  bound  (4.5),  although  it  is 
still  inferior  to  (5.3. 3-5). 


Our  new  algorithm  meets  our  goal  of  not  significantly  increasing  the  cost  of  the  standard 
Choleskv  decomposition,  which  is  about  n2/6  each  additions  and  multiplications.  The  additional 
costs  of  the  modified  factorization  are  (n-k)2  additions  to  calculate  the  Gerschgorin  bounds  of 
Ak+\  at  the  start  of  phase  two,  (where  k  is  the  number  of  iterations  performed  in  phase  one), 
(n-k)2/ 2  additions  to  calculate  the  /]  norms  of  the  pivot  rows  during  phase  two,  and  at  most 
(n-k)2/ 2  each  multiplications  and  additions  to  update  the  Gerschgorin  bounds  during  phase  two. 
In  addition  there  is  a  small  multiple  of  n-k  additional  work.  (The  strategy  for  precalculating  the 
new  diagonal  during  phase  one,  in  order  when  to  determine  when  to  switch  to  phase  two,  only 
costs  a  small  multiple  of  n  operations  as  long  as  the  precalculated  values  are  stored  and  used 
when  phase  one  is  continued.)  Thus  the  total  additional  cost  of  the  modified  Choleskv  decompo¬ 
sition  at  most  2 n2  additions  and  n2/ 2  multiplications,  in  the  case  when  phase  two  is  started 
immediately  (k=0).  In  many  cases  in  our  experience,  k  is  close  to  n  so  the  additional  costs  are 
very'  small. 


Finally,  we  include  a  small  example  to  demonstrate  the  performance  of  the  new  modified 
Cholcsky  algorithm.  Consider  the  matrix  used  by  Gill,  Murray,  and  Wright  [1981]  to  illustrate 
their  modified  Cholcsky  factorization. 


A  = 


1  1  2 

1  1  3 

2  3  1 


Our  new  algorithm  will  proceed  as  follows.  At  the  first  iteration,  no  pivoting  is  performed  in 

A  T1 

phase  one,  and  then  the  algorithm  immediately  switches  to  phase  2  because  A  33  -  <  0.  The 

A  u 
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Gerschgorin  intervals  of  A  are 

[-2. 4],  [-3, 5]  and  [-4,  6], 


The  row  with  the  maximum  lower  Gerschgorin  bound  is  also  row  1,  so  no  pivoting  is  required  in 
this  iteration  for  phase  2  either.  The  modified  Cholesky  algorithm  then  choses  5  =  2=  -(Ger¬ 
schgorin  lower  bound  of  row  1),  and  after  the  elimination  step, 


and  the  estimated  Gerschgorin  bounds  are  unchanged.  The  algorithm  now  enters  the  final,  2x2 
submatrix  stage.  The  eigenvalues  of  A  2  are  (-2.2196,  2.5538),  so  that  82  =2.2196  and  htotal 
=  2.2196.  Thus  for  the  new  algorithm. 


and  !|£  ||«  =  2.22.  This  is  19c  greater  than  the  most  negative  eigenvalue  of  A  which  is -2.2109. 
(If  we  had  continued  the  Gerschgorin  strategy  for42  rather  than  use  the  eigenvalue  strategy,  52 
would  be  2.67.) 


Using  the  same  matrix  A  ,  the  Gill,  Murray  and  Wright  algorithm  computes 


with  ||  E  1 1»  =  5.01. 


6.  Computational  Results 


We  have  compared  the  performance  of  our  new  modified  Cholesky  factorization  (Algo¬ 
rithm  5.3.1  and  Appendix  I)  to  the  algorithm  of  Gill,  Murray  and  Wright  [1981]  on  a  number  of 
indefinite  test  matrices.  The  measures  we  used  to  assess  the  performance  of  the  algorithms  are  the 
ratios  1 1£  1 1„/  |Xi(A)|,  termed  relative  maxadd,  which  reflect  how  well  the  algorithm  has 
satisfied  the  goal  of  adding  as  little  as  possible  to  the  diagonal  of  A ,  and  the  condition  numbers  of 
,4  +E .  We  already  know  that  the  other  two  goals  stated  at  the  beginning  of  Section  3,  low  cost 
and  not  disturbing  safely  positive  definite  matrices,  are  satisfied  by  both  algorithms. 


We  tested  both  algorithms  on  matrices  of  dimension  25,  50  and  75,  with  eigenvalue  ranges 
of  [-1,  10000],  [-1,  1],  and  [-10000,-1].  For  each  combination  of  dimension  and  eigenvalue 
range,  10  matrices  were  created.  Thus  (the  same)  90  test  problems  that  were  used  to  test  each 
algorithm.  Each  test  matrix  was  created  by  forming  the  product  Q\QiQsD  (.Q\Qz(h)T *  where 
each  Qt  is  a  Householder  matrix  of  the  torm 


Qi  =/ - 


and  each  component  of  each  w  is  randomly  generated  from  a  uniform  distribution  in  the  range 
[-1,  1].  Each  D  is  a  diagonal  matrix  whose  elements  were  randomly  generated  from  a  uniform 
distribution  in  the  desired  eigenvalue  range,  with  the  exception  that  for  the  set  of  test  matrices 
with  eigenvalue  range  [-1,  10000],  one  element  of  D  was  generated  from  the  range  [-1, 0],  thus 
guaranteeing  at  least  one  negative  eigenvalue  in  the  test  matrices  of  that  range. 


The  relative  maxadds  for  the  90  tests  of  each  algorithm  are  shown  in  Figures  1A.C.E, 
2A.C.E  and  3A,C,E  in  Appendix  II.  In  summary,  the  relative  maxadds  for  the  new  algorithm 
were  always  small,  and  sometimes  considerably  superior  to  those  for  the  Gill,  Murray,  and 
Wright  algorithm,  although  this  algorithm's  performance  was  also  good  in  most  cases.  The 
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relative  maxadds  for  the  new  algorithm  ranged  from  1.06  to  2.5,  and  was  below  1.71  for  all  but  5 
of  the  90  cases.  The  relative  maxadds  for  the  Gill,  Murray,  and  Wright  algorithm  ranged  from 
1.6  to  77.8,  distributed  as  follows  among  the  various  groups  of  test  matrices.  For  the  matrices 
with  eigenvalues  in  the  [-1, 10000]  range,  the  relative  maxadds  ranged  from  2.1  to  5.6.  In  the 
[-1,  1]  eigenvalue  range,  the  relative  maxadds  were  in  the  range  4.9  to  77.8,  and  in  the  final 
[-10000,-1]  eigenvalue  range  the  relative  maxadds  ranged  from  1.6  to  5.1.  Comparing  on  a 
problem  by  problem  basis,  the  new  algorithm  performed  from  3.5  to  60.9  times  better  than  the 
Gill,  Murray  and  Wright  method  in  terms  of  the  relative  maxadd  for  the  problems  with  the  [-1,1] 
eigenvalue  range,  and  from  1.3  to  4.2  times  better  for  the  remaining  test  cases. 

Figures  4 A -41  show  the  relative  maxadds  for  the  new  algorithm  only,  to  illustrate  more 
clearly  the  how  close  1 1  £  1 1  „  is  to  Xi(A )  for  this  method.  Also  included  in  Figures  4A-4I  are 
the  results  for  a  version  of  the  new  algorithm  that  differs  only  in  that  bases  its  pivots  at  each  itera¬ 
tion  of  phase  two  upon  the  actual  Gerschgorin  bounds  rather  than  their  estimates.  The  additional 
cost  of  calculating  these  bounds  is  about  (n-k)3/ 3,  or  at  most  n3/ 3,  additional  additions.  The 
results  in  Figure  4  show  that  pivoting  on  the  exact  Gerschgorin  bounds  leads  to  some  improve¬ 
ment  in  the  size  of  relative  maxadd ,  but  we  do  not  consider  the  improvements  sufficient  to  war¬ 
rant  the  extra  cost  in  general. 

The  condition  numbers  of  A+E  for  the  two  methods  are  given  in  Figures  1B,D,F,  2B,D,F 
and  3B.D.F  in  Appendix  II.  Basically,  both  methods  produced  acceptably  conditioned  matrices 
in  all  cases.  The  conditions  numbers  for  the  matrices  produced  by  the  new  method  varied  from 
10'  to  106,  whereas  the  condition  numbers  for  the  Gill,  Murray  and  Wright  method  varied  from 
10'  to  108.  The  condition  numbers  for  the  new  method  are  sometimes  directly  related  to  the  final 
step  of  the  algorithm,  which,  if  it  increases  |  j£  ]  |„,  does  so  by  the  amount  necessary  to  make 
the  final  2x2  submatrix  positive  definite  with  condition  number  x.  In  our  test  cases,  the  tolerance 
x  was  ( macheps  )1,3,  or  roughly  10-5-2  on  the  Sun  3/75  used  for  these  tests.  This  accounts  for  the 
condition  numbers  of  almost  106  in  all  the  cases  where  the  final  step  increased  ||£  ||„,. 
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Decreasing  this  tolerance  generally  was  found  to  decrease  the  condition  number,  usually  without 
appreciably  increasing  1 1 E  1 1 

Interestingly,  in  the  cases  where  the  new  algorithm  produced  the  most  significant  improve¬ 
ments  in  relative  maxadds ,  the  test  problems  with  the  [-1,  1]  eigenvalue  range,  it  also  produced 
much  better  conditioned  matrices  than  the  Gill,  Murray,  and  Wright  algorithm.  For  this  test  set, 
the  ratios  of  the  Gill,  Murray  and  Wright  condition  numbers  to  the  condition  numbers  of  the  new 
algorithm  were  between  102  and  104  for  n  =  25,  between  104  and  10s  forn  =50,  and  between  10s 
and  107  for  n  =  75.  For  the  other  two  eigenvalue  ranges,  the  ratios  of  the  condition  numbers  pro¬ 
duced  by  the  two  algorithms  all  varied  by  at  most  2  orders  of  magnitude,  with  the  condition 
numbers  for  the  new  algorithm  consistently  higher  for  the  test  problems  in  the  [-1,  10000]  eigen¬ 
value  range,  and  the  Gill,  Murray  and  Wright  condition  numbers  usually  higher  for  the  test  prob¬ 
lems  in  the  [-10000,-1]  range. 

Finally,  Figures  5A,B  in  Appendix  II  contain  the  test  results  for  a  different  set  of  matrices 
of  dimension  n  =  25  with  eigenvalue  range  [-1,  10000].  The  difference  between  these  test 
matrices  and  the  ones  used  in  figures  1A,B  is  that  these  matrices  were  created  to  have  at  least  3 
negative  eigenvalues,  whereas  the  original  test  problems  in  the  [-1,  10000]  range  were  created 
with  at  least  1  negative  eigenvalue.  What  is  interesting  about  the  results  of  this  new  test  set  is 
that  on  one  particular  matrix  out  of  the  10,  the  new  algorithm  perforens  significantly  worse  than 
the  Gill,  Murray  and  Wright  algorithm.  (This  phenomenon  did  not  occur  with  the  test  sets  of  size 
50  or  75  in  this  range  with  3  negative  eigenvalues,  so  we  have  not  included  this  data).  The  poor 
behavior  occurred  when  the  algorithm  was  at  the  (n-4)'A  iteration,  so  we  created  a  4x4  matrix 
with  similar  characteristics  that  illustrates  the  problem  even  more  markedly 


The  matrix 
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1890.3  -1705.6  -315.8  3000.3 

-1705.6  1538.3  284.9  -2706.6 
-315.8  284.9  52.5  -501.2 

3000.3  -2706.6  -501.2  4760.8 

has  eigenvalues  -0.378,  -0.343,  -0.248,  and  8242.869.  The  first  few  steps  performed  by  the  new 
algorithm  are  as  follows: 

1.  Interchange  row  and  column  4  with  row  and  column  1,  because  A  4  4  is  the  maximum  diagonal 
element. 

At, 

2.  Switch  to  phase  2  because  A  3 1  — <  0. 

A  1,1 

3.  Calculate  the  lower  Gerschgorin  bounds  (-1447.3,  -3158.8,  -1049.4,  -3131.4),  and  since 
-Glow 3  is  the  maximum  value,  interchange  row  and  column  3  with  row  and  column  1. 

4.  Add  (-Glowpivotrow )  =  1049.4  to  A  u. 

At  this  point  in  the  computation,  the  new  algorithm  has  already  added  much  more  to  the 
diagonal  than  is  necessary'  to  make  A  positive  definite.  From  this  point  on  it  doesn’t  increase 
1 1£  j  1 00,  so  that  the  final  value  of  1 1£  | )«  is  1049.4.  On  the  other  hand,  the  Gill,  Murray,  and 
Wright  algorithm  produces  |  \E  \  |„  =  1.01.  This  behavior  occurs  because,  at  the  first  iteration, 
the  Gill,  Murray,  and  Wright  algorithm  pivots  on  the  maximum  diagonal  element  and  then  adds 
nothing  to  the  diagonal,  which  after  elimination  results  in  a  3x3  submatrix  all  of  whose  entries 
have  absolute  value  less  than  0.52.  This  is  guaranteed  to  then  lead  to  a  small  |  |E  1 1  (Indeed, 
if  our  algorithm  performed  the  same  first  step  as  the  Gill,  Murray,  Wright  algorithm  and  then  pro¬ 
ceeded  as  usual,  it  would  produce  ||£  ||.  =  0.665.) 

The  essential  characteristic  of  this  example  is  that  A  is  equal  to  a  large  symmetric  rank  one 
matrix  plus  a  small  indefinite  matrix.  Thus,  if  nothing  is  added  to  A  l  t  at  the  first  iteration,  the 
remaining  submatrix  after  the  elimination  has  very  small  elements,  and  1 1£  1 1  is  small.  The 
Gill,  Murray,  and  Wright  algorithm  will  usually  outperform  ours  on  matrices  of  this  type.  We 
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have  experimented  with  modifications  to  our  algorithm  that  perform  well  for  this  case,  but  all  of 
them  resulted  in  degradation  of  our  algorithm’s  performance  in  other  cases.  Since  the  case  only 
occurred  once  in  the  120  test  cases  discussed  in  this  section,  we  would  hope  that  it  is  not  common 
in  practice. 


7.  Conclusions 


We  have  presented  a  new  modified  Cholesky  factorization  algorithm  that  does  a  good  job 
of  meeting  the  objectives  outlined  at  the  start  of  Section  3.  It  is  based  upon  two  new  techniques, 
the  use  of  Gerschgorin  circle  theorem  bounds  to  decide  how  much  to  add  to  the  diagonal,  and  the 
use  of  a  two  phase  structure  to  differentiate  between  positive  definite  and  non-positive  definite 
matrices.  Given  a  symmetric  matrix  A  ,  the  factorization  produces  a  Cholesky  factorization  of  a 
positive  definite  matrix  A+E.  Its  cost  is  at  most  In2  additions  and  n2l 2  multiplications  more 
than  the  standard  Cholesky  factorization,  and  it  does  not  perturb  safely  positive  definite  matrices 
A  .  Its  theoretical  bound  on  1 1 E  \  |  is  a  factor  of  n  lower  than  for  the  Gill,  Murray,  and  Wright 
[1981]  method.  In  computational  tests  on  non-positive  definite  matrices,  it  virtually  always  pro¬ 
duces  a  smaller  |  \E  [  [  than  the  method  of  Gill,  Murray,  and  Wright,  and  the  conditioning  of 
A+E  is  always  quite  acceptable.  On  the  class  of  test  problems  where  the  Gill,  Murray,  and 
Wright  algorithm  had  the  most  difficulty,  those  with  eigenvalue  range  [-1,1],  the  decreases  in 
I '  £  1 1  and  in  the  condition  number  of  A  +E  are  both  substantial. 

We  have  not  tested  the  effect  of  substituting  our  new  modified  Cholesky  factorization  for 
that  of  Gill,  Murray,  and  Wright  in  optimization  algorithms.  The  most  common  optimization  test 
problems  have  small  n  and  few  if  any  indefinite  iterations,  so  probably  there  would  be  little  effect 
on  these.  The  new  algorithm  might  make  a  difference  on  problems  where  n  is  larger  and  there  is 
some  indefiniteness.  In  our  opinion,  the  biggest  advantage  of  the  new  method  for  optimization 
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purposes  is  its  improved  theoretical  bound  on  |  \E  1 1  and  the  corresponding  reduction  in  | \E  1 1 
that  has  been  observed  in  practice.  These  properties  guard  against  overflows  during  the  factoriza¬ 
tion,  and  against  steps  ( A  +£)_1V/  (x)  that  are  far  too  small. 

In  addition,  the  new  algorithm  leads  to  an  easy  implementation  of  trust  region  methods  for 
optimization,  because  1 1  £  1 1  is  generally  within  a  factor  of  1.5  of  the  smallest  eigenvalue  ) 
of  A  .  By  first  calculating  E ,  then  replacing  A  with  A  +{  |  ]  £  1 1 )/  if  £  *0,  and  then  using  the  trust 
region  method  for  positive  definite  matrices,  one  will  usually  get  the  solution  to  the  exact,  possi¬ 
bly  indefinite  trust  region  problem  without  using  any  other  special  provisions  for  dealing  with 
non-positive  definite  matrices.  We  have  already  used  the  factorization  successfully  in  this  con¬ 
text.  If  there  are  other  computational  algorithms  where  a  crude  estimate  of  the  most  negative 
eigenvalue  of  a  matrix  is  useful,  either  by  itself  or  as  a  starting  estimate  of  some  iterative  pro¬ 
cedure,  then  this  factorization  may  provide  a  good  way  to  find  it. 


Acknowledgement 


We  than  Eric  Van  Vleck  for  performing  the  early  computational  experiments  with  the  new 
method. 


8.  References 


J.  E.  Dennis  Jr.  and  R.  B.  Schnabel  [1983],  Numerical  Methods  for  Nonlinear  Equations  and 
Unconstrained  Optimization,  Prentice-Hall,  Englewood  Cliffs,  New  Jersey. 

P.  E.  Gill  and  W.  Murray  [1974],  "Newton-type  methods  for  unconstrained  and  linearly  con¬ 
strained  optimization".  Mathematical  Programming  28,  pp.  311-350. 

P.  E.  Gill,  W.  Murray,  and  M.  H.  Wright  [1981],  Practical  Optimization,  Academic  Press,  Lon¬ 
don. 

G.  A.  Shultz,  R.  B.  Schnabel,  and  R.  H.  Byrd  [1985],  "A  family  of  trust  region  based  algorithms 
for  unconstrained  minimization  with  strong  global  convergence  properties",  SIAM  Journal  on 
Numerical  Analysis, 22,  pp.  47-67. 


29 


Appendix  I  —  Complete  Modified  Cholesky  Decomposition  Algorithm 

Given  A  e  Rm  *■  symmetric  (stored  in  lower  triangle)  and  x  (e.g.  x=  (macheps)173), 
find  factorization  L  LT  of  A  +  £  ,  E  >0 

phaseone  :=  true 
y  :=  max  |  Au  \ 

ISiSii 

j  :=  1 

(*  Phase  One,  A  potentially  positive  definite  *) 

While  j  <n  and  phaseone  =  true  do 

(*  Pivot  on  maximum  diagonal  of  remaining  submatrix  *) 
i  :=  index  of  max  Au 

j  Si  Sa 

if  i*jy  switch  rows  and  columns  i  and  j  of  A 

If  min  {Au  -  )  <xy 

then  phaseone  :=  false  (*  go  to  Phase  Two  *) 
else  (*  perform  jth  iteration  of  factorization  *) 

Ljj  =  'JaJj  (*  Ljj  overwrites  Ajj  *) 

For  i  :=  j  +  1  to  n  do 

Lij  :=  Aij  I  Ljj  (*  Lij  overwrites  AtJ  *) 

For  k  :=  j  + 1  to  i  do 

Am  •-  Aik  ~  L^  *  Lkj 

j  :=j  +  l 

(*  end  Phase  One  *) 

(*  Phase  Two,  A  not  positive  definite  *) 

If  phaseone  =  false  then 

k  :=  j  -  1  (*  k  =  number  of  iterations  performed  in  Phase  One  *) 

(*  Calculate  lower  Gerschgorin  bounds  of  A*+i  *) 

For  i  :=  k  + 1  to  n  do 

gi  ■=  t  I  Aij  |+  j  I  Aji  |  -Au 

y=3+i  ;=,+l 

(*  Modified  Cholesky  Decomposition  *) 

For  j  :=k+ 1  ton-2  do 

(*  Pivot  on  maximum  lower  Geiieligorin  bound  estimate  *) 
i  :=  index  of  max  (g, ) 

j  S  i  Sa 

if  i  *j ,  switch  rows  and  columns  i  and  j  of  A 
(*  Calculate  Ejj  and  add  to  diagonal  *) 

normj  :=  £  |  Atj  | 

i=;+l 

8  (*  =  Ejj  * )  =  min  { 0,  -  Ajj  +  max  {  normj ,  T  y)  ,  8 prev  ) 
if  8  >  0  then 

Ajj  :=  Ajj  +  8 

Sprev  :=  8  (*  bprev  will  contain  ||£  ||  -  *) 

(*  update  Gerschgorin  bound  estimates  *) 

If  Ajj  *  normj  then 

temp  :=«i- 1 

for  i  :=  ;  +  1  to  n  do 

g,  :=,?,  +  !  Aij  |  ♦  temp 
(•  perform  jth  iteration  of  factorization  •) 
same  code  as  in  Phase  One 
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(*  final  2x2  submatrix  *)  - 

A  A  If  ^*^1-1 

>  A.*,  :=  eigenvalues  of  .  . 

A*,*- 1  A** 

8  :=  max  { 0 ,  -Xu,  +  x  *  max  { (A*  -  A.*, )  ,Y }  ,8prev) 
if  8  >  0  then 

^d-i.o-i  :=  +  5 

Anjt  '=  AK  ^  +  8 
8 prcv  :=  8 

Lm-I.m-I  :=  'JAh-i.Ah-i  (*  overwrites  Ah-i,„-i  *) 


f-n.o-i  :=  A,,,-)  /L,_ii(l_i  (*  overwrites  A, t._i  *) 
:=  (A,.,  -L,,,-!2)^  (*  overwrites  4.,.  *) 


(*  End  Phase  Two  *) 
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3A.  n=75.  eis.  ranee  f-1, 10000 


3B.  n=75,  eie.  ranee  [-1,10000 
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4E.  n=50.  eig.  range  f-1.11 


4F.  n=50.  eig.  range  [-10000.-11 
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METHODS:  New  Method 

New  Method  with  0(n**3)  pivoting 


(note:  two  curves  are  identical  in  Figures  4  A,D  ) 


Figure  4,  Part  I  —  Relative  Maxadds  for  Two  Versions  of  the  New  Method 
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41.  n=/5.  em.  ranee  [-10000.-1] 


METHODS 
New  Method 

New  Method  with  0(n**3)  pivoting 


Figure  4,  Pan  II  --  Relative  Maxadds  for  Two  Versions  of  the  New  Method 
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METHODS 
Gill,  Murray  &  Wright 
New  Method 


Figure  5  -  Performance  of  Existing  and  New  Methods  on  a  Test  Set  with  3  Negative  Eigenvalues 
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